What will really happen when the world ends: Terrifying simulation reveals how the apocalypse will encourage people to go on KILLING sprees
A terrifying simulation has revealed how people might really behave as the end of the world approaches. And it suggests that humanity's darkest instincts might reign supreme at the very end.
- North America > United States > Pennsylvania (0.24)
- North America > United States > California > San Francisco County > San Francisco (0.24)
- North America > Canada > Alberta (0.14)
- Media > Television (1.00)
- Media > Film (1.00)
- Leisure & Entertainment > Sports > Football (1.00)
Kelp: A Streaming Safeguard for Large Models via Latent Dynamics-Guided Risk Detection
Li, Xiaodan, Wu, Mengjie, Zhu, Yao, Lv, Yunna, Chen, YueFeng, Chen, Cen, Guo, Jianmei, Xue, Hui
Large models (LMs) are powerful content generators, yet their open-ended nature can also introduce potential risks, such as generating harmful or biased content. Existing guardrails mostly perform post-hoc detection that may expose unsafe content before it is caught, and latency constraints further push them toward lightweight models, limiting detection accuracy. In this work, we propose Kelp, a novel plug-in framework that enables streaming risk detection within the LM generation pipeline. Kelp leverages intermediate LM hidden states through a Streaming Latent Dynamics Head (SLD), which models the temporal evolution of risk across the generated sequence for more accurate real-time risk detection. To ensure reliable streaming moderation in real applications, we introduce an Anchored Temporal Consistency (ATC) loss to enforce monotonic harm predictions by embedding a benign-then-harmful temporal prior. In addition, for a rigorous evaluation of streaming guardrails, we present StreamGuardBench, a model-grounded benchmark featuring on-the-fly responses from each protected model, reflecting real-world streaming scenarios in both text and vision-language tasks. Across diverse models and datasets, Kelp consistently outperforms state-of-the-art post-hoc guardrails and prior plug-in probes (15.61% higher average F1), while using only 20M parameters and adding less than 0.5 ms of per-token latency.
- Law (1.00)
- Government (0.68)
- Information Technology > Security & Privacy (0.46)
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Nian, Yi, Zhu, Shenzhe, Qin, Yuehan, Li, Li, Wang, Ziyi, Xiao, Chaowei, Zhao, Yue
Multimodal large language models (MLLMs) excel in vision-language tasks but also pose significant risks of generating harmful content, particularly through jailbreak attacks. Jailbreak attacks refer to intentional manipulations that bypass safety mechanisms in models, leading to the generation of inappropriate or unsafe content. Detecting such attacks is critical to ensuring the responsible deployment of MLLMs. Existing jailbreak detection methods face three primary challenges: (1) many rely on model hidden states or gradients, limiting their applicability to white-box models, where the internal workings of the model are accessible; (2) they involve high computational overhead from uncertainty-based analysis, which limits real-time detection; and (3) they require fully labeled harmful datasets, which are often scarce in real-world settings. To address these issues, we introduce a test-time adaptive framework called JAILDAM. Our method leverages a memory-based approach guided by policy-driven unsafe knowledge representations, eliminating the need for explicit exposure to harmful data. By dynamically updating unsafe knowledge at test time, our framework improves generalization to unseen jailbreak strategies while maintaining efficiency. Experiments on multiple VLM jailbreak benchmarks demonstrate that JAILDAM delivers state-of-the-art performance in harmful content detection, improving both accuracy and speed.
- North America > United States > California (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Maryland (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
Luo, Weidi, Ma, Siyuan, Liu, Xiaogeng, Guo, Xiaoyu, Xiao, Chaowei
With the rapid advancements in Multimodal Large Language Models (MLLMs), securing these models against malicious inputs while aligning them with human values has emerged as a critical challenge. In this paper, we investigate an important and unexplored question of whether techniques that successfully jailbreak Large Language Models (LLMs) can be equally effective in jailbreaking MLLMs. To explore this issue, we introduce JailBreakV-28K, a pioneering benchmark designed to assess the transferability of LLM jailbreak techniques to MLLMs, thereby evaluating the robustness of MLLMs against diverse jailbreak attacks. Utilizing a dataset of 2,000 malicious queries that is also proposed in this paper, we generate 20,000 text-based jailbreak prompts using advanced jailbreak attacks on LLMs, alongside 8,000 image-based jailbreak inputs from recent MLLM jailbreak attacks. Our comprehensive dataset includes 28,000 test cases across a spectrum of adversarial scenarios. Our evaluation of 10 open-source MLLMs reveals a notably high Attack Success Rate (ASR) for attacks transferred from LLMs, highlighting a critical vulnerability in MLLMs that stems from their text-processing capabilities. Our findings underscore the urgent need for future research to address alignment vulnerabilities in MLLMs from both textual and visual inputs.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Ohio (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (0.94)
Intelligent to a Fault: When AI Screws Up, You Might Still Be to Blame
Artificial intelligence is already making significant inroads in taking over mundane, time-consuming tasks many humans would rather not do. The responsibilities and consequences of handing over work to AI vary greatly, though; some autonomous systems recommend music or movies; others recommend sentences in court. Even more advanced AI systems will increasingly control vehicles on crowded city streets, raising questions about safety--and about liability, when the inevitable accidents occur. But philosophical arguments over AI's existential threats to humanity are often far removed from the reality of actually building and using the technology in question. Deep learning, machine vision, natural language processing--despite all that has been written and discussed about these and other aspects of artificial intelligence, AI is still at a relatively early stage in its development.
- Law (1.00)
- Transportation > Passenger (0.51)
- Transportation > Ground > Road (0.51)
'Secret sister' Facebook scam encourages people to unknowingly break the law
A Christmas Facebook scam appears to be too good to be true – and, of course, very much is. Just one of a range of malicious hoaxes appearing across the network encourages people to spend money on buying gifts with the promise that they'll get far more gifts in return. Except only one half of that actually happens – and it's the bit that involves someone taking your money. The "Secret Sister Gift Exchange" involves some variation on a message asking people to take part as a way of spreading joy.
- North America > United States > California > Los Angeles County > Los Angeles (0.76)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.23)
- Asia > South Korea > Seoul > Seoul (0.07)
- Information Technology (1.00)
- Automobiles & Trucks > Manufacturer (1.00)
- Transportation > Ground > Road (0.50)
SoundCloud Go launches in the UK: Ads and subscriptions come to streaming service as it takes on Apple and Spotify
- North America > United States > Nevada > Clark County > Las Vegas (0.05)
- North America > United States > California > San Francisco County > San Francisco (0.05)
- Europe > United Kingdom > England > Kent (0.05)
- Media > Music (1.00)
- Leisure & Entertainment > Games > Computer Games (0.70)
- Information Technology > Artificial Intelligence (0.70)
- Information Technology > Communications > Mobile (0.31)
- Information Technology > Human Computer Interaction > Interfaces (0.30)
- Information Technology > Communications > Social Media (0.30)